Skip to content

fix(tests): #34 race-free parallel cli tests via ctor + #[serial]#42

Closed
hanwencheng wants to merge 1 commit intomainfrom
fix/issue-34
Closed

fix(tests): #34 race-free parallel cli tests via ctor + #[serial]#42
hanwencheng wants to merge 1 commit intomainfrom
fix/issue-34

Conversation

@hanwencheng
Copy link
Copy Markdown
Member

Problem (issue #34)

Tests in crates/agentkeys-cli/tests/cli_tests.rs mutated process-global env vars (HOME, AGENTKEYS_SESSION_STORE) mid-test via unsafe { std::env::set_var(...) }. Under the default parallel test runner this raced — one test's set_var could stomp on another test's observed value.

Our CI hid this latent race by passing --test-threads=1 everywhere. Contributors who forgot the flag got flaky failures that looked like timing bugs but were actually HOME / SESSION_STORE stomping. Codex flagged the pattern on PRs #18, #19, #20, #27.

Issue: #34

Fix — two layers

1. AGENTKEYS_SESSION_STORE=file#[ctor] pre-main hook

Every test wants this set so session_store::save_session uses the file path instead of the OS keychain. A single #[ctor::ctor] fn now runs during dylib init, before any test thread starts. No race window where one test could hit the real keychain while another is still inside a set_var call.

#[ctor::ctor]
fn init_test_env() {
    unsafe { std::env::set_var("AGENTKEYS_SESSION_STORE", "file"); }
}

Replaces 9 scattered set_var sites (init_session_direct + every test that redeclared the env).

2. HOME#[serial] on the 4 tests that mutate it

The self-revoke / by-wallet revoke tests point HOME at their own tempdir to sandbox session_store::fallback_path(). Each picks a different tempdir. Annotated with serial_test::serial so they don't race each other — non-HOME tests still run fully parallel.

Tests annotated:

  • cmd_revoke_self_clears_local_session
  • cmd_revoke_with_own_wallet_clears_local_session
  • cmd_revoke_with_other_wallet_keeps_local_session
  • cmd_revoke_no_session_errors_cleanly

New dev-deps

  • serial_test = "3"#[serial] attribute macro
  • ctor = "0.2" — pre-main init hook

Test plan

Before (contract enforced by --test-threads=1):

cargo test -p agentkeys-cli  # races, hangs intermittently

After:

cargo test -p agentkeys-cli  # → 37 passed in 0.85s, no flags needed
  • Parallel run passes cleanly (37/37 in 0.85s)
  • cargo clippy -p agentkeys-cli -- -D warnings stays clean
  • No unrelated files in the diff (3 files: Cargo.toml, Cargo.lock, cli_tests.rs)

Closes #34.

Tests in \`crates/agentkeys-cli/tests/cli_tests.rs\` mutated process-global
env vars (\`HOME\`, \`AGENTKEYS_SESSION_STORE\`) mid-test. Under the default
parallel test runner this raced — one test's \`set_var\` could stomp on
another test's observed value. Our CI hid this by passing
\`--test-threads=1\` everywhere. Drops that requirement.

**\`AGENTKEYS_SESSION_STORE=file\` — \`#[ctor]\` pre-main hook**
Every test wants this set so \`session_store::save_session\` uses the file
path instead of the OS keychain. A single \`#[ctor::ctor] fn\` now runs
during dylib init, before any test thread starts. No race window where
one test could hit the real keychain while another is still inside a
\`set_var\` call.

**\`HOME\` — \`#[serial]\` on the 4 tests that mutate it**
The self-revoke / by-wallet revoke tests point HOME at their own tempdir
to sandbox \`session_store::fallback_path()\`. Each picks a different
tempdir. Annotated with \`serial_test::serial\` so they don't race each
other — non-HOME tests still run fully parallel.

**New dev-deps**
- \`serial_test = \"3\"\`
- \`ctor = \"0.2\"\`

Test:
\`cargo test -p agentkeys-cli\` → 37 passed in 0.85s, no
\`--test-threads=1\` required.

Closes #34.
hanwencheng added a commit that referenced this pull request Apr 16, 2026
Option 2 follow-up to #42.

Replaces the implicit `$HOME` / `AGENTKEYS_SESSION_STORE` env-var reads
at every call with an explicit `SessionStore` handle that owns both. Tests
construct one rooted at a per-test tempdir in file-only mode — no more
`unsafe { set_var }`, no `#[serial]`, no process-global races.

Changes
- agentkeys-core/src/session_store.rs: introduce `SessionStore` struct
  with `base_dir` + `KeyringMode`, expose `new`, `file_only`, `from_env`
  constructors and method-form save/load/clear/list/session_path.
  Existing free functions (`save_session`, `load_session`, `clear_session`,
  `fallback_path`, `list_fallback_session_ids`,
  `load_session_with_legacy_fallback`) are preserved as thin wrappers that
  delegate to `SessionStore::from_env()`, so the daemon and any external
  callers keep working unchanged. Internal `marker_path` and file-I/O
  helpers are now methods on `SessionStore`.
- agentkeys-core session_store tests: drop the `with_temp_home` mutex +
  `unsafe { set_var }` helper and rewrite to build a per-test
  `SessionStore::file_only(tmp)` directly. Fully parallel, fully hermetic.
- agentkeys-cli CommandContext: add optional `session_store_override`
  field and `.with_session_store(store)` builder. New `session_store()`
  accessor returns the override or `SessionStore::from_env()`. The five
  `session_store::*` call sites in `cmd_init`, `cmd_revoke` (x2),
  `cmd_recover`, and `CommandContext::load_session` now route through it.
- agentkeys-cli integration tests: delete all 13 `unsafe { set_var }`
  calls and all HOME mutation. Every test owns a `(SessionStore, TempDir)`
  pair and threads the store through context builders. Drops the flaky
  race surface (issue #34) by construction rather than by serialization.

Why this on top of #42
- #42 patched the symptom with `#[ctor]` + `#[serial]`, which works but
  still compiles `unsafe { set_var }` into the test binary and requires
  future HOME-touching tests to remember `#[serial]`. This refactor
  removes the env-mutation pattern entirely — tests can't reintroduce
  the race because the test helpers no longer expose HOME as a knob.
- Also unlocks a genuine product feature: callers can point the CLI at
  a non-default base directory without touching the process environment
  (future `AGENTKEYS_HOME` / `--data-dir` / container deploys).

Test plan
- `cargo test --workspace` — all 151 tests pass
- `cargo test -p agentkeys-cli` × 5 consecutive runs, parallel scheduler —
  37/37 each time, ~0.2–0.7s
- `cargo clippy -p agentkeys-core -p agentkeys-cli --tests` — no new
  lints (pre-existing warnings in unrelated files unchanged)

Closes #34.
hanwencheng added a commit that referenced this pull request Apr 16, 2026
…) (#43)

* refactor(session_store): thread base_dir + KeyringMode through API (#34)

Option 2 follow-up to #42.

Replaces the implicit `$HOME` / `AGENTKEYS_SESSION_STORE` env-var reads
at every call with an explicit `SessionStore` handle that owns both. Tests
construct one rooted at a per-test tempdir in file-only mode — no more
`unsafe { set_var }`, no `#[serial]`, no process-global races.

Changes
- agentkeys-core/src/session_store.rs: introduce `SessionStore` struct
  with `base_dir` + `KeyringMode`, expose `new`, `file_only`, `from_env`
  constructors and method-form save/load/clear/list/session_path.
  Existing free functions (`save_session`, `load_session`, `clear_session`,
  `fallback_path`, `list_fallback_session_ids`,
  `load_session_with_legacy_fallback`) are preserved as thin wrappers that
  delegate to `SessionStore::from_env()`, so the daemon and any external
  callers keep working unchanged. Internal `marker_path` and file-I/O
  helpers are now methods on `SessionStore`.
- agentkeys-core session_store tests: drop the `with_temp_home` mutex +
  `unsafe { set_var }` helper and rewrite to build a per-test
  `SessionStore::file_only(tmp)` directly. Fully parallel, fully hermetic.
- agentkeys-cli CommandContext: add optional `session_store_override`
  field and `.with_session_store(store)` builder. New `session_store()`
  accessor returns the override or `SessionStore::from_env()`. The five
  `session_store::*` call sites in `cmd_init`, `cmd_revoke` (x2),
  `cmd_recover`, and `CommandContext::load_session` now route through it.
- agentkeys-cli integration tests: delete all 13 `unsafe { set_var }`
  calls and all HOME mutation. Every test owns a `(SessionStore, TempDir)`
  pair and threads the store through context builders. Drops the flaky
  race surface (issue #34) by construction rather than by serialization.

Why this on top of #42
- #42 patched the symptom with `#[ctor]` + `#[serial]`, which works but
  still compiles `unsafe { set_var }` into the test binary and requires
  future HOME-touching tests to remember `#[serial]`. This refactor
  removes the env-mutation pattern entirely — tests can't reintroduce
  the race because the test helpers no longer expose HOME as a knob.
- Also unlocks a genuine product feature: callers can point the CLI at
  a non-default base directory without touching the process environment
  (future `AGENTKEYS_HOME` / `--data-dir` / container deploys).

Test plan
- `cargo test --workspace` — all 151 tests pass
- `cargo test -p agentkeys-cli` × 5 consecutive runs, parallel scheduler —
  37/37 each time, ~0.2–0.7s
- `cargo clippy -p agentkeys-core -p agentkeys-cli --tests` — no new
  lints (pre-existing warnings in unrelated files unchanged)

Closes #34.

* fix(session_store): drop public SessionStore::new to enforce isolation-by-construction (codex PR #43 [P2])

Codex /codex review on PR #43 flagged that `SessionStore::new(base_dir, KeyringMode::Auto)` was unsafe: keyring entries are keyed on `session_id` alone and ignore `base_dir`, so two stores at different roots sharing a `session_id` would silently alias through the OS keychain. The newly introduced API promised per-root isolation but the `Auto` path violated that promise.

Fix: remove the public `SessionStore::new(base_dir, mode)` constructor. The only public constructors are now:
- `SessionStore::file_only(base_dir)` — explicit custom root, always file-only, safe by construction
- `SessionStore::from_env()` — home-rooted (the single-root invariant the keyring namespace assumes), may select `KeyringMode::Auto`

Custom-root callers physically cannot opt into keyring aliasing. No callers (internal or external) used `::new` with `KeyringMode::Auto` and a custom `base_dir`, so no migration is needed.

Tests: `cargo test -p agentkeys-core -p agentkeys-cli` → 22 core + 37 cli, 0 failures, 0.78s.

Refs #34, addresses codex [P2] on PR #43.

* fix(session_store): return a real error from load_with_legacy_fallback fallthrough (claude PR #43 review #1)

The trailing `self.load(session_id)` at the end of `load_with_legacy_fallback` was redundant — the same call already failed at the top of the function, so re-running it produced the same error with no added context about which legacy fallbacks were tried.

Replace with `anyhow::bail!` that names the attempted session path, the keyring mode, and the fact that legacy fallbacks only trigger for `id="master"`. Debuggability win; same observable behavior (still returns Err).

Tests: `cargo test -p agentkeys-core -p agentkeys-cli` → 22 + 37, 0 failures.
@hanwencheng
Copy link
Copy Markdown
Member Author

closed by #43

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

tests: process-global HOME mutation races in parallel tokio::test (cli_tests.rs)

1 participant